ABSTRACT



Hard thresholding, LASSO, adaptive LASSO and SCAD point estimators have been suggested for use in the linear regression context when most of the components of the regression parameter vector are believed to be zero, a sparsity type of assumption. Pötscher and Schneider, 2010, Electronic Journal of Statistics, have considered the properties of fixed-width confidence intervals that include one of these point estimators (for all possible data values). They consider a normal linear regression model with orthogonal regressors and show that these confidence intervals are longer than the standard confidence interval (based on the maximum likelihood estimator) when the tuning parameter for these point estimators is chosen to lead to either conservative or consistent model selection. We extend this analysis to the case of variable-width confidence intervals that include one of these point estimators (for all possible data values). In consonance with these findings of Pötscher and Schneider, we find that these confidence intervals perform poorly by comparison with the standard confidence interval, when the tuning parameter for these point estimators is chosen to lead to consistent model selection. However, when the tuning parameter for these point estimators is chosen to lead to conservative model selection, our conclusions differ from those of Pötscher and Schneider. We consider the variable-width confidence intervals of Farchione and Kabaila, 2008, Statistics & Probability Letters, which have advantages over the standard confidence interval in the context that there is a belief in a sparsity type of assumption. These variable-width confidence intervals are shown to include the hard thresholding, LASSO, adaptive LASSO and SCAD estimators (for all possible data values) provided that the tuning parameters for these estimators are chosen to belong to an appropriate interval.
1 Introduction



Hard-thresholding, LASSO (Tibshirani [7]), adaptive LASSO (Zou [8]) and SCAD (Fan and Li [1]) point estimators have been suggested for use in the linear regression context when most of the components of the regression parameter vector are believed to be zero, a sparsity type of assumption. Pötscher and Schneider [5] ask to what extent these point estimators can be used as the basis for confidence intervals for these components. They consider the properties of fixed-width confidence intervals that are constrained to include one of these point estimators (for all possible data values). They do this in the context of a normal linear regression model with orthogonal regressors for both the case that (a) the error variance is assumed known and (b) the error variance is estimated by the usual unbiased estimator obtained by fitting the full model to the data. Pötscher and Schneider [5] show that these confidence intervals are longer than the standard confidence interval based on the maximum likelihood estimator, when the tuning parameter for these point estimators is chosen to lead to either conservative or consistent model selection. By consistent model selection, we mean that the selected model is the true model with probability approaching 1 as $n \to \infty$, where $n$ denotes the dimension of the response vector. By conservative model selection, we mean a model selection that (a) is not consistent and (b) is such that the selected model includes the true model with probability approaching 1 as $n \to \infty$.

To what extent are these findings due to the requirement that these confidence intervals have fixed widths? A variable-width confidence interval based on a given point estimator has the property that this confidence interval includes this point estimator, for all possible data values. We first consider the case that the tuning parameter for these point estimators is chosen to lead to consistent model selection. In Section 3, we present a new result that shows that variable-width confidence intervals that include one of these point estimators (for all possible data values) must perform poorly by comparison with the standard confidence interval. In this case, our conclusions are similar to those in [5]. This is perhaps not surprising, given the results of Kabaila [3] and Pötscher [4].

Next, we consider the case that the tuning parameter for these point estimators is chosen to lead to conservative model selection. Pötscher and Schneider [5] find that fixed-width confidence intervals that are constrained to include one of these point estimators (for all possible data values) are longer than the standard confidence interval. This may be interpreted as a negative finding for these point estimators. Yet, these point estimators have some very attractive features. Figure 9 of [7] shows contours of constant value of $|\beta_1|^q + |\beta_2|^q$ for $q = 4, 2, 1, 0.5$ and $0.1$. As Tibshirani [7] states, "The lasso corresponds to q = 1." and "The value q = 1 has the advantage of being closer to subset selection than is ridge regression (q = 2) and is also the smallest value of q giving a convex region.". The LASSO estimator has the attractive feature that it is a continuous function of the data. Like the LASSO, the adaptive LASSO and the SCAD estimators use a thresholding rule that sets estimated coefficients with small magnitudes to zero. The adaptive LASSO and the SCAD estimators also have the attractive features that (a) they are continuous functions of the data and (b) they are nearly unbiased when the true unknown parameter has large magnitude ([1], [8]). How do we resolve the apparent conflict between the findings of [5] and the existence of these very attractive features? We show that this finding can be explained (at least in part) by the requirement in [5] that the confidence intervals have fixed widths.

Following [5], we consider a normal linear regression model with orthogonal regressors for both the case that (a) the error variance is assumed known and (b) the error variance is estimated by the usual unbiased estimator obtained by fitting the full model to the data. It is plausible that the case that the error variance is known amounts essentially to the assumption that the error variance is estimated with great accuracy. In Appendix B, we provide a precise motivation for considering the known error variance case. In Section 4, we consider the variable-width confidence intervals of Farchione and Kabaila [2], in the known error variance case. These confidence intervals are shown to have advantages over the standard confidence interval when there is a belief in a sparsity type of assumption. These variable-width confidence intervals are shown to include the hard-thresholding, LASSO, adaptive LASSO and SCAD estimators (for all possible data values) provided that the tuning parameters for these estimators are chosen to belong to an appropriate interval. In Section 5, we consider the extension of these results to the case that the error variance is estimated by the usual unbiased estimator obtained by fitting the full model to the data.

2 The model and the point estimators considered 

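We consider a normal linear regression model with orthogonal regressors. As pointed out in [5], without loss of generality we may suppose that the data $Y_1, \dots, Y_n$ are independent and identically $N(\theta, \sigma^2)$ distributed, where $\theta \in \mathbb{R}$ and $\sigma > 0$. We use lower case to denote the observed value of a random variable. We also use a similar notation to that used in [5] for the hard thresholding, LASSO and adaptive LASSO estimators. Namely, the hard thresholding estimator $\hat\Theta_H$ is given by

$$\hat\Theta_H = \begin{cases} 0 & \text{if } |\bar Y| \le \hat\Sigma \eta_n \\ \bar Y & \text{if } |\bar Y| > \hat\Sigma \eta_n \end{cases}$$

where the tuning parameter $\eta_n$ is a positive real number, $\bar Y = n^{-1} \sum_{i=1}^n Y_i$ and $\hat\Sigma^2 = (n-1)^{-1} \sum_{i=1}^n (Y_i - \bar Y)^2$. The LASSO estimator $\hat\Theta_S$ is given by

$$\hat\Theta_S = \operatorname{sign}(\bar Y)\,(|\bar Y| - \hat\Sigma \eta_n)_+ = \begin{cases} -\max\{|\bar Y| - \hat\Sigma \eta_n, 0\} & \text{if } \bar Y < 0 \\ 0 & \text{if } \bar Y = 0 \\ \max\{|\bar Y| - \hat\Sigma \eta_n, 0\} & \text{if } \bar Y > 0 \end{cases}$$

where $\operatorname{sign}(x)$ is equal to $-1$ for $x < 0$, $0$ for $x = 0$ and $1$ for $x > 0$, and $x_+ = \max\{x, 0\}$. The adaptive LASSO estimator $\hat\Theta_A$ is given by

$$\hat\Theta_A = \bar Y \left(1 - \hat\Sigma^2 \eta_n^2 / \bar Y^2\right)_+ = \begin{cases} 0 & \text{if } |\bar Y| \le \hat\Sigma \eta_n \\ \bar Y - \dfrac{\hat\Sigma^2 \eta_n^2}{\bar Y} & \text{if } |\bar Y| > \hat\Sigma \eta_n \end{cases}$$

We also consider the following SCAD estimator $\hat\Theta_C$:

$$\hat\Theta_C = \begin{cases} \operatorname{sign}(\bar Y)\,(|\bar Y| - \hat\Sigma \eta_n)_+ & \text{if } |\bar Y| \le 2\hat\Sigma \eta_n \\ \left((a-1)\bar Y - \operatorname{sign}(\bar Y)\, a \hat\Sigma \eta_n\right)/(a-2) & \text{if } 2\hat\Sigma \eta_n < |\bar Y| \le a \hat\Sigma \eta_n \\ \bar Y & \text{if } |\bar Y| > a \hat\Sigma \eta_n \end{cases}$$

where $a = 3.7$ (see p. 1351 of [1] for a motivation for this choice of $a$).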
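The four thresholding rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the argument names `ybar`, `s` and `eta` (standing for the observed values of $\bar Y$, $\hat\Sigma$ and the tuning parameter $\eta_n$) are assumptions made here, not notation from the paper.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def hard_threshold(ybar, s, eta):
    """Hard thresholding: keep ybar only if it exceeds the threshold s * eta."""
    return ybar if abs(ybar) > s * eta else 0.0


def lasso(ybar, s, eta):
    """LASSO (soft thresholding): shrink ybar towards zero by s * eta."""
    return sign(ybar) * max(abs(ybar) - s * eta, 0.0)


def adaptive_lasso(ybar, s, eta):
    """Adaptive LASSO: ybar * (1 - (s * eta)^2 / ybar^2)_+ ."""
    lam = s * eta
    return ybar - lam ** 2 / ybar if abs(ybar) > lam else 0.0


def scad(ybar, s, eta, a=3.7):
    """SCAD rule with the three regions given in the text (a = 3.7 by default)."""
    lam = s * eta
    if abs(ybar) <= 2 * lam:
        return sign(ybar) * max(abs(ybar) - lam, 0.0)
    if abs(ybar) <= a * lam:
        return ((a - 1) * ybar - sign(ybar) * a * lam) / (a - 2)
    return ybar
```

All four rules return zero when `abs(ybar) <= s * eta`, which is the property used in Section 3.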

3 Variable-width confidence intervals based on the
point estimators when the tuning parameter is
chosen for consistent model selection

In this section, we suppose that $\eta_n \to 0$ and $\sqrt{n}\,\eta_n \to \infty$, as $n \to \infty$. In other words, we suppose that the tuning parameter $\eta_n$ is chosen so as to lead to consistent model selection. In this case, for example, the probability that $\hat\Theta_H$ is equal to 0 approaches 1 for $\theta = 0$, whilst $\hat\Theta_H$ converges in probability to $\theta$ for $\theta \ne 0$ (as $n \to \infty$). For clarity, in this section we will use the subscript $n$ to make explicit a dependence on $n$. Let $\hat\theta_n(\bar y_n, \hat\sigma_n)$ denote a point estimate of $\theta$ that satisfies the condition that if $|\bar y_n| \le \hat\sigma_n \eta_n$ then $\hat\theta_n(\bar y_n, \hat\sigma_n) = 0$. The estimates $\hat\theta_H$, $\hat\theta_S$ and $\hat\theta_A$ satisfy this condition. With a small change of notation, the estimate $\hat\theta_C$ also satisfies this condition. The standard $1 - \alpha$ confidence interval for $\theta$ is

$$J_n = \left[\bar Y_n - t(n-1)\hat\Sigma_n/\sqrt{n},\; \bar Y_n + t(n-1)\hat\Sigma_n/\sqrt{n}\right]$$

where the quantile $t(m)$ is defined by the requirement that $P(-t(m) \le T \le t(m)) = 1 - \alpha$ for $T \sim t_m$.

A variable-width confidence interval based on the point estimate $\hat\theta_n(\bar y_n, \hat\sigma_n)$ has the property that this confidence interval includes this point estimate, for all possible data values. Consider the confidence interval

$$D_n(\bar Y_n, \hat\Sigma_n) = \left[\ell_n(\bar Y_n, \hat\Sigma_n),\; u_n(\bar Y_n, \hat\Sigma_n)\right]$$

for $\theta$, that is required to satisfy the following conditions for all $n$:

(a) $\hat\theta_n(\bar y_n, \hat\sigma_n) \in D_n(\bar y_n, \hat\sigma_n)$ for all $(\bar y_n, \hat\sigma_n) \in \mathbb{R} \times (0, \infty)$. In other words, the confidence interval $D_n$ contains the estimate $\hat\theta_n$, for all possible data values.

(b) $P_{\theta,\sigma}\big(\theta \in D_n(\bar Y_n, \hat\Sigma_n)\big) \ge 1 - \alpha$ for all $(\theta, \sigma) \in \mathbb{R} \times (0, \infty)$. In other words, $D_n$ is a $1 - \alpha$ confidence interval for $\theta$.

The following result shows that this confidence interval performs very poorly by comparison with $J_n$, the standard $1 - \alpha$ confidence interval for $\theta$.

Theorem 1. Let $\theta_n = \sigma \eta_n / 2$. For each $\sigma \in (0, \infty)$,

$$\frac{E_{\theta_n,\sigma}\big(\text{length of } D_n(\bar Y_n, \hat\Sigma_n)\big)}{E_{\theta_n,\sigma}\big(\text{length of standard } 1-\alpha \text{ confidence interval } J_n\big)} \to \infty$$

as $n \to \infty$.

The proof of this theorem is presented in Appendix A.

4 Variable-width confidence intervals of Farchione
and Kabaila when the error variance is known

Consider the "known error variance case". The motivation for considering this case is given in Appendix B. Suppose that $\sigma^2$ is known. Consider the $1 - \alpha$ confidence interval for $\theta$, put forward by Farchione and Kabaila [2], that has the form

$$C = \left[-\frac{\sigma}{\sqrt{n}}\, b\!\left(-\frac{\sqrt{n}\,\bar Y}{\sigma}\right),\; \frac{\sigma}{\sqrt{n}}\, b\!\left(\frac{\sqrt{n}\,\bar Y}{\sigma}\right)\right] \qquad (1)$$

where the function $b$ satisfies $b(x) \ge -b(-x)$ for all $x \in \mathbb{R}$. This constraint is required to ensure that the upper endpoint of this confidence interval is never less than the lower endpoint. This particular form of confidence interval is motivated by the invariance arguments presented in Section 4 of [2]. The standard $1 - \alpha$ confidence interval for $\theta$ is $I = \left[\bar Y - z\sigma/\sqrt{n},\; \bar Y + z\sigma/\sqrt{n}\right]$, where the quantile $z$ is defined by the requirement that $P(-z \le Z \le z) = 1 - \alpha$ for $Z \sim N(0, 1)$. Note that this confidence interval can be expressed in the form $C$ (take $b(x) = x + z$).

The coverage probability and expected length properties of the confidence interval $C$ are conveniently examined by applying the same change of scale (by multiplying by $\sqrt{n}/\sigma$) to the parameter $\theta$, the estimator $\bar Y$, the confidence interval $C$ and the standard confidence interval $I$. Define $\psi = (\sqrt{n}/\sigma)\theta$, $X = (\sqrt{n}/\sigma)\bar Y$,

$$C^* = \frac{\sqrt{n}}{\sigma}\, C = [-b(-X),\; b(X)], \qquad (2)$$

and $I^* = (\sqrt{n}/\sigma) I = [X - z,\; X + z]$. Note that $X \sim N(\psi, 1)$. We consider $C^*$ to be a confidence interval for $\psi$, based on $X$. The standard $1 - \alpha$ confidence interval for $\psi$ (based on $X$) is $I^*$. Note that $P_{\theta,\sigma}(\theta \in C) = P_\psi(\psi \in C^*)$ and

$$\frac{E_{\theta,\sigma}(\text{length of } C)}{\text{length of } I} = \frac{E_\psi(\text{length of } C^*)}{\text{length of } I^*},$$

for $\psi = (\sqrt{n}/\sigma)\theta$.

Following [2], we assess $C^*$, for parameter value $\psi$, using the relative efficiency

$$e(\psi) = \left(\frac{E_\psi(\text{length of } C^*)}{\text{length of } I^*}\right)^2 = \left(\frac{E_\psi(\text{length of } C^*)}{2z}\right)^2.$$

This is a measure of the efficiency of the standard $1 - \alpha$ confidence interval $I^*$ by comparison with the efficiency of the $1 - \alpha$ confidence interval $C^*$. The relative efficiency $e(\psi)$ is the ratio (sample size used for $C^*$)/(sample size used for $I^*$) such that $E_\psi(\text{length of } C^*) = \text{length of } I^*$ (cf. p. 555 of [6]). Farchione and Kabaila [2] use the methodology of Pratt [6], with a new weight function determined by a parameter $w$, to find a confidence interval $C^*$ such that $e(0)$ is minimized, while ensuring that $\max_\psi e(\psi)$ is not too large. In other words, if $\psi$ happens to be 0 then $C^*$ performs better than the standard $1 - \alpha$ confidence interval $I^*$. On the other hand, if $\psi \ne 0$ then the worst possible performance of $C^*$ is $\max_\psi e(\psi)$, which is not too large. In addition, this confidence interval has endpoints that approach the endpoints of the standard $1 - \alpha$ confidence interval $I^*$ as $|x| \to \infty$. This implies that $e(\psi) \to 1$ as $|\psi| \to \infty$. We have chosen $w = 0.1$ and $1 - \alpha = 0.95$. The coverage probability $P_\psi(\psi \in C^*)$ is 0.95 for all $\psi$. The relative efficiency $e(\psi)$ of $C^*$ for this case is shown in Figure 1. For comparison, the 0.95 confidence interval described on p. 555 of [6] has relative efficiency 0.72 at $\psi = 0$. This, however, comes at the very high cost of the relative efficiency diverging to $\infty$ as $|\psi| \to \infty$.
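The definition of $e(\psi)$ can be evaluated numerically for any candidate function $b$. The sketch below is illustrative only: it does not reproduce the function $b$ computed by Farchione and Kabaila [2]. The names `relative_efficiency`, `b_standard` and `b_toy` are assumptions made here, and `b_toy` is a hypothetical function chosen purely to show the mechanics (it is not claimed to give a valid $1 - \alpha$ confidence interval).

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # quantile z for 1 - alpha = 0.95


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def relative_efficiency(b, psi, z=Z, half_width=10.0, steps=4000):
    """e(psi) = (E_psi[length of C*] / (2 z))^2, where X ~ N(psi, 1) and
    length of C* = b(X) + b(-X). The expectation is approximated by the
    trapezoid rule on [psi - half_width, psi + half_width]."""
    lo = psi - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (b(x) + b(-x)) * phi(x - psi)
    expected_length = total * h
    return (expected_length / (2.0 * z)) ** 2


def b_standard(x):
    """b for the standard interval I* = [x - z, x + z]."""
    return x + Z


def b_toy(x):
    """Hypothetical b: shorter interval near x = 0, standard for |x| >= 3.
    Illustration only; coverage is NOT guaranteed to be 0.95."""
    return x + Z - 0.2 * max(0.0, 1.0 - abs(x) / 3.0)
```

With `b_standard` the function returns 1 (up to numerical error) for every `psi`, as expected, since $C^*$ then coincides with $I^*$. With `b_toy` the value is below 1 at `psi = 0` and returns to 1 for large `|psi|`.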

Figure 1: Plot of the efficiency of the standard 95% confidence interval by comparison with the Farchione and Kabaila 95% confidence interval (for $w = 0.1$), as a function of $\psi$.

We now consider the properties of the confidence interval $C^*$ in the context that most of the components of the regression parameter vector are believed to be zero, a sparsity type of assumption. Firstly, suppose that a large majority of the components of the regression parameter vector are zero. In this case, $C^*$ compares very favourably with the standard $1 - \alpha$ confidence interval. If $\psi = 0$, corresponding to one of the large majority of the components of the regression parameter vector that are zero, then $e(\psi)$ is approximately 0.8. On the other hand, if $\psi \ne 0$, corresponding to one of the small minority of components of the regression parameter vector that are non-zero, then the maximum possible value of $e(\psi)$ is approximately 1.2. Secondly, in the "best of all possible worlds" scenario that a large majority of the components of the regression parameter vector are zero and the remaining components have large magnitudes, $C^*$ may be said to effectively dominate the standard $1 - \alpha$ confidence interval. If $\psi = 0$, corresponding to one of the large majority of the components of the regression parameter vector that is zero, then $e(\psi)$ is approximately equal to 0.8. On the other hand, if $|\psi|$ is large, corresponding to one of the small minority of the components of the regression parameter vector that has large magnitude, then $e(\psi)$ is approximately equal to 1. We conclude that $C^*$ has advantages over the standard $1 - \alpha$ confidence interval $I^*$ when a sparsity type of assumption holds.

5 Variable-width confidence intervals based on the
point estimators when the tuning parameter is
chosen for conservative model selection and the
error variance is known

In this section, we suppose that $\eta_n \to 0$. We also suppose that there exists a positive integer $N$ and $a_\ell$ and $a_u$ (satisfying $0 < a_\ell < a_u < \infty$), such that $\sqrt{n}\,\eta_n \in [a_\ell, a_u]$ for all $n \ge N$. This includes the particular case that $\sqrt{n}\,\eta_n \to a$ ($0 < a < \infty$), as $n \to \infty$. In other words, we suppose that the tuning parameter $\eta_n$ is chosen so as to lead to conservative model selection. We consider the "known error variance case". The motivation for considering this case is given in Appendix B.

Suppose that $\sigma^2$ is known. We consider the conditions under which the point estimate

$$\tilde\theta_H = \begin{cases} 0 & \text{if } |\bar y| \le \sigma \eta_n \\ \bar y & \text{if } |\bar y| > \sigma \eta_n \end{cases}$$

of $\theta$ belongs in the confidence interval $C$ (defined by (1)) for all $(\bar y, \sigma) \in \mathbb{R} \times (0, \infty)$. Define $\tau_n = \sqrt{n}\,\eta_n$. As in Section 4, multiply the estimate $\tilde\theta_H$ and the confidence interval $C$ by $\sqrt{n}/\sigma$, to obtain

$$\tilde\psi_H = \frac{\sqrt{n}}{\sigma}\,\tilde\theta_H = \begin{cases} 0 & \text{if } |x| \le \tau_n \\ x & \text{if } |x| > \tau_n \end{cases}$$

and $C^* = (\sqrt{n}/\sigma) C$ (see (2)), where $x = (\sqrt{n}/\sigma)\bar y$. Obviously, $\tilde\theta_H \in C$ for all $(\bar y, \sigma) \in \mathbb{R} \times (0, \infty)$ is equivalent to $\tilde\psi_H \in C^*$ for all $x \in \mathbb{R}$. There exists a positive number $c_H^*$ such that, for every $\tau_n \in (0, c_H^*]$, the following is true: $\tilde\psi_H \in C^*$ for all $x \in \mathbb{R}$. Similar statements hold for the other point estimates $\tilde\theta_S$, $\tilde\theta_A$ and $\tilde\theta_C$ (the corresponding estimators are defined towards the end of Appendix B).

Define $\tilde\psi_S = (\sqrt{n}/\sigma)\tilde\theta_S$, $\tilde\psi_A = (\sqrt{n}/\sigma)\tilde\theta_A$ and $\tilde\psi_C = (\sqrt{n}/\sigma)\tilde\theta_C$. We have computed the maximum values of $\tau_n$ such that $\tilde\psi_H$, $\tilde\psi_S$, $\tilde\psi_A$ and $\tilde\psi_C$ are in the interval $C^*$ (for all $x$). In each case this maximum value was found to be 1.96. Figures 2 and 3 show the values of the estimator as a function of $x$ for this maximum value, together with the endpoints of the confidence interval $C^*$ as functions of $x$.
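The search for the largest admissible $\tau_n$ can be sketched as a containment check on a grid of $x$ values combined with bisection on $\tau_n$. The sketch below is an illustration under stated assumptions: the function $b$ of Farchione and Kabaila [2] is computed numerically in that paper and is not reproduced here, so the standard interval $b(x) = x + z$ is used as a stand-in. The names `always_inside` and `max_tau`, the grid range and the bisection bracket are all hypothetical choices. For this stand-in the largest admissible value can be checked by hand to equal $z \approx 1.96$ for all four estimates.

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z for 1 - alpha = 0.95


def sign(x):
    return (x > 0) - (x < 0)


def psi_hard(x, tau):
    return x if abs(x) > tau else 0.0


def psi_lasso(x, tau):
    return sign(x) * max(abs(x) - tau, 0.0)


def psi_alasso(x, tau):
    return x - tau * tau / x if abs(x) > tau else 0.0


def psi_scad(x, tau, a=3.7):
    if abs(x) <= 2 * tau:
        return sign(x) * max(abs(x) - tau, 0.0)
    if abs(x) <= a * tau:
        return ((a - 1) * x - sign(x) * a * tau) / (a - 2)
    return x


def b_standard(x):
    """Stand-in for b: gives the standard interval [x - z, x + z]."""
    return x + Z


def always_inside(estimate, tau, b, x_max=12.0, steps=4801):
    """Check -b(-x) <= estimate(x, tau) <= b(x) on a grid over [-x_max, x_max]."""
    for i in range(steps):
        x = -x_max + 2.0 * x_max * i / (steps - 1)
        if not (-b(-x) <= estimate(x, tau) <= b(x)):
            return False
    return True


def max_tau(estimate, b, lo=0.0, hi=5.0, iters=40):
    """Bisection for the largest tau with containment on the grid. Assumes
    containment holds for small tau and fails for large tau."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if always_inside(estimate, mid, b):
            lo = mid
        else:
            hi = mid
    return lo
```

Calling `max_tau(psi_hard, b_standard)` and the analogous calls for the other three rules return values within the grid resolution of $z$. To repeat the computation reported in the text, `b_standard` would need to be replaced by the function $b$ of [2].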
Figure 2: The left and right panels show the hard-thresholding estimate $\tilde\psi_H$ and the LASSO estimate $\tilde\psi_S$, respectively, as functions of $x$ (for $\tau_n = 1.96$). Also shown, in both panels, is the Farchione and Kabaila 95% confidence interval $C^*$ as a function of $x$ (for $w = 0.1$).

Figure 3: The left and right panels show the adaptive LASSO estimate $\tilde\psi_A$ (for $\tau_n = 1.96$) and the SCAD estimate $\tilde\psi_C$ (for $\tau_n = 1.96$, $a = 3.7$), respectively, as functions of $x$. Also shown, in both panels, is the Farchione and Kabaila 95% confidence interval $C^*$ as a function of $x$ (for $w = 0.1$).

6 Variable-width confidence intervals of Farchione
and Kabaila and the point estimators when the
tuning parameter is chosen for conservative model
selection and the error variance is unknown

Suppose that the error variance $\sigma^2$ is unknown. Consider the $1 - \alpha$ confidence interval for $\theta$, put forward in Section 5 of [2], that has the form

$$D = \left[-\frac{\hat\Sigma}{\sqrt{n}}\, b\!\left(-\frac{\sqrt{n}\,\bar Y}{\hat\Sigma}\right),\; \frac{\hat\Sigma}{\sqrt{n}}\, b\!\left(\frac{\sqrt{n}\,\bar Y}{\hat\Sigma}\right)\right]$$

where the function $b$ satisfies $b(x) \ge -b(-x)$ for all $x \in \mathbb{R}$. This constraint is required to ensure that the upper endpoint of this confidence interval is never less than the lower endpoint. This particular form of confidence interval can be motivated by invariance arguments similar to those presented in Section 4 of [2]. The standard $1 - \alpha$ confidence interval for $\theta$ is

$$J = \left[\bar Y - t(n-1)\hat\Sigma/\sqrt{n},\; \bar Y + t(n-1)\hat\Sigma/\sqrt{n}\right]$$

where the quantile $t(m)$ is defined by the requirement that $P(-t(m) \le T \le t(m)) = 1 - \alpha$ for $T \sim t_m$. Note that this confidence interval can be expressed in the form $D$.

Define $R = \hat\Sigma/\sigma$. The coverage probability and expected length properties of the confidence interval $D$ are conveniently examined by applying the same change of scale (by multiplying by $\sqrt{n}/\sigma$) to the parameter $\theta$, the estimator $\bar Y$, the confidence interval $D$ and the standard confidence interval $J$. Define $\psi = (\sqrt{n}/\sigma)\theta$, $X = (\sqrt{n}/\sigma)\bar Y$,

$$D^* = \frac{\sqrt{n}}{\sigma}\, D = [-R\, b(-X/R),\; R\, b(X/R)]$$

and $J^* = (\sqrt{n}/\sigma) J = [X - t(n-1)R,\; X + t(n-1)R]$. Note that $X$ and $R$ are independent random variables and that $X \sim N(\psi, 1)$. As noted in Appendix B, the coverage probability and expected length properties of $D$ are conveniently evaluated using the fact that

$$P_{\theta,\sigma}(\theta \in D) = P_\psi(\psi \in D^*)$$

and

$$\frac{E_{\theta,\sigma}(\text{length of } D)}{E_{\theta,\sigma}(\text{length of } J)} = \frac{E_\psi(\text{length of } D^*)}{E_{\theta,\sigma}(\text{length of } J^*)}$$

for $\psi = (\sqrt{n}/\sigma)\theta$.

Following [2], we assess $D^*$, for parameter value $\psi$, using the relative efficiency

$$e(\psi) = \left(\frac{E_\psi(\text{length of } D^*)}{E_{\theta,\sigma}(\text{length of } J^*)}\right)^2 = \left(\frac{E_\psi(\text{length of } D^*)}{2\,t(n-1)\,E(R)}\right)^2.$$

This is a measure of the efficiency of the standard $1 - \alpha$ confidence interval $J^*$ by comparison with the efficiency of the $1 - \alpha$ confidence interval $D^*$. Farchione and Kabaila [2] present (in Section 6) a computational methodology with a weight function determined by a parameter $w$, to find a confidence interval $D^*$ such that $e(0)$ is minimized, while ensuring that $\max_\psi e(\psi)$ is not too large. In other words, if $\psi$ happens to be 0 then $D^*$ performs better than the standard $1 - \alpha$ confidence interval $J^*$. On the other hand, if $\psi \ne 0$ then the worst possible performance of $D^*$ is $\max_\psi e(\psi)$, which is not too large. In addition, the confidence interval $D^*$ has endpoints that are the same as the endpoints of the standard $1 - \alpha$ confidence interval $J^*$ for sufficiently large $|X|/R$. This implies that $e(\psi) \to 1$ as $|\psi| \to \infty$. Farchione and Kabaila [2] found computationally that for the same choice of parameter $w$, the confidence intervals $C^*$ and $D^*$ have similar relative efficiencies (as functions of $\psi$), provided that $n$ is not small. This is illustrated by Figure 2 of [2]. Theoretical support for this computational finding is provided by Theorem 2 of Appendix B of the present paper.

As in Section 5, suppose that the tuning parameter $\eta_n$ is chosen so as to lead to conservative model selection. We consider the conditions under which the point estimate

$$\hat\theta_H = \begin{cases} 0 & \text{if } |\bar y| \le \hat\sigma \eta_n \\ \bar y & \text{if } |\bar y| > \hat\sigma \eta_n \end{cases}$$

of $\theta$ belongs in the confidence interval $D$ (observed value) for all $(\bar y, \hat\sigma) \in \mathbb{R} \times (0, \infty)$. Define $\tau_n = \sqrt{n}\,\eta_n$. Multiply the estimate $\hat\theta_H$ and the observed value of the confidence interval $D$ by $\sqrt{n}/\hat\sigma$, to obtain

$$\hat\psi_H = \frac{\sqrt{n}}{\hat\sigma}\,\hat\theta_H = \begin{cases} 0 & \text{if } |x| \le \tau_n \\ x & \text{if } |x| > \tau_n \end{cases}$$

and

$$\frac{\sqrt{n}}{\hat\sigma}\, D = [-b(-x),\; b(x)],$$

where $x = (\sqrt{n}/\hat\sigma)\bar y$. Obviously, $\hat\theta_H \in D$ for all $(\bar y, \hat\sigma) \in \mathbb{R} \times (0, \infty)$ is equivalent to $\hat\psi_H \in (\sqrt{n}/\hat\sigma) D$ for all $x \in \mathbb{R}$. There exists a positive number $c_H$ such that, for every $\tau_n \in (0, c_H]$, the following is true: $\hat\psi_H \in (\sqrt{n}/\hat\sigma) D$ for all $x$. Similar statements hold for the other point estimates $\hat\theta_S$, $\hat\theta_A$ and $\hat\theta_C$. As noted earlier, the computational results of [2] and Theorem 2 of Appendix B suggest that (provided that $n$ is not small) the situation here is very similar to that described in Section 5 and Figures 2 and 3. In other words, we expect that $c_H \approx c_H^*$ (where $c_H^*$ is defined in Section 5), provided that $n$ is not small.
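The denominator $2\,t(n-1)\,E(R)$ in the relative efficiency involves $E(R)$, which has a closed form because $(n-1)R^2$ has a chi-square distribution with $n-1$ degrees of freedom. The following minimal sketch evaluates it; the function name `expected_R` is an assumption made here, and the quantile $t(n-1)$ is not computed because that would need a library beyond the Python standard library.

```python
import math


def expected_R(n):
    """E(R) for R = Sigma_hat / sigma, using (n - 1) R^2 ~ chi-square(n - 1).

    E(R) = sqrt(2 / m) * Gamma((m + 1) / 2) / Gamma(m / 2), with m = n - 1.
    The ratio of gamma functions is computed on the log scale for stability.
    """
    m = n - 1
    log_ratio = math.lgamma((m + 1) / 2.0) - math.lgamma(m / 2.0)
    return math.sqrt(2.0 / m) * math.exp(log_ratio)
```

Since $E(R) < 1$ and $E(R) \to 1$ as $n \to \infty$, the denominator approaches $2z$, the length of $I^*$, which is consistent with the similarity between $C^*$ and $D^*$ for $n$ not small.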



7 Conclusion 



The results of this paper confirm, yet again, that the hard-thresholding, LASSO, adaptive LASSO and SCAD point estimators form a very poor foundation for confidence interval construction when the tuning parameter for these estimators is chosen to lead to consistent model selection. However, the results of this paper do not, by any means, rule out the use of these point estimators as the foundation for confidence interval construction when the tuning parameter for these estimators is chosen to lead to conservative model selection.



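A Proof of Theorem 1

Define the event $A_n = \{|\bar Y_n| \le \hat\Sigma_n \eta_n\}$. By the law of total probability,

$$P_{\theta,\sigma}\big(\{\theta \in D_n(\bar Y_n, \hat\Sigma_n)\} \cap A_n\big) + P_{\theta,\sigma}\big(\{\theta \in D_n(\bar Y_n, \hat\Sigma_n)\} \cap A_n^c\big) \ge 1 - \alpha \quad \text{for all } (\theta, \sigma).$$

In particular,

$$P_{\theta_n,\sigma}\big(\{\theta_n \in D_n(\bar Y_n, \hat\Sigma_n)\} \cap A_n\big) + P_{\theta_n,\sigma}\big(\{\theta_n \in D_n(\bar Y_n, \hat\Sigma_n)\} \cap A_n^c\big) \ge 1 - \alpha \quad \text{for all } \sigma.$$

Define the event $B_n = \{u_n(\bar Y_n, \hat\Sigma_n) \ge \theta_n\}$. When the event $A_n$ occurs, the point estimate is 0 and must belong to $D_n$, so that $\ell_n(\bar Y_n, \hat\Sigma_n) \le 0 < \theta_n$, and so

$$P_{\theta_n,\sigma}\big(\{\theta_n \in D_n(\bar Y_n, \hat\Sigma_n)\} \cap A_n\big) = P_{\theta_n,\sigma}(B_n \cap A_n) \quad \text{for all } \sigma.$$

Thus, for each $\sigma \in (0, \infty)$,

$$P_{\theta_n,\sigma}(B_n \cap A_n) \ge 1 - \alpha - P_{\theta_n,\sigma}\big(\{\theta_n \in D_n(\bar Y_n, \hat\Sigma_n)\} \cap A_n^c\big) \ge 1 - \alpha - P_{\theta_n,\sigma}(A_n^c).$$

Lemma 1. For each $\sigma \in (0, \infty)$, $P_{\theta_n,\sigma}(A_n^c) \to 0$ as $n \to \infty$.

Proof. Fix $\sigma \in (0, \infty)$. It is sufficient to prove that $P_{\theta_n,\sigma}(A_n) \to 1$ as $n \to \infty$. Now

$$A_n = \left\{|X_n| \le \frac{\hat\Sigma_n}{\sigma}\sqrt{n}\,\eta_n\right\}$$

where $X_n = \sqrt{n}\,\bar Y_n/\sigma$. Note that $X_n \sim N(\sqrt{n}\,\eta_n/2, 1)$. Observe that

$$\left\{\frac{\hat\Sigma_n}{\sigma} \ge \frac{3}{4}\right\} \cap \left\{|X_n| \le \frac{3}{4}\sqrt{n}\,\eta_n\right\} \subset A_n.$$

Thus

$$P_{\theta_n,\sigma}(A_n) \ge P_{\theta_n,\sigma}\left(\left\{\frac{\hat\Sigma_n}{\sigma} \ge \frac{3}{4}\right\} \cap \left\{|X_n| \le \frac{3}{4}\sqrt{n}\,\eta_n\right\}\right)$$

and the right-hand side converges to 1 as $n \to \infty$, since $\hat\Sigma_n/\sigma$ converges in probability to 1 and $\sqrt{n}\,\eta_n \to \infty$.

□

Also, when the event $B_n \cap A_n$ occurs, $\ell_n(\bar Y_n, \hat\Sigma_n) \le 0$ and $u_n(\bar Y_n, \hat\Sigma_n) \ge \theta_n$, so that $u_n(\bar Y_n, \hat\Sigma_n) - \ell_n(\bar Y_n, \hat\Sigma_n) \ge \theta_n$. Hence,

$$E_{\theta_n,\sigma}\big(\text{length of } D_n(\bar Y_n, \hat\Sigma_n)\big) \ge \theta_n\, P_{\theta_n,\sigma}(B_n \cap A_n).$$

Thus, for each $\sigma \in (0, \infty)$,

$$\frac{E_{\theta_n,\sigma}\big(\text{length of } D_n(\bar Y_n, \hat\Sigma_n)\big)}{E_{\theta_n,\sigma}\big(\text{length of standard } 1-\alpha \text{ CI for } \theta\big)} \ge \frac{\theta_n\, P_{\theta_n,\sigma}(B_n \cap A_n)}{2\,t(n-1)\,E(\hat\Sigma_n)/\sqrt{n}} = \frac{\sqrt{n}\,\eta_n\, P_{\theta_n,\sigma}(B_n \cap A_n)}{4\,t(n-1)\,E(\hat\Sigma_n/\sigma)}$$

which tends to infinity as $n \to \infty$.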
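Lemma 1 can be illustrated by a small Monte Carlo experiment. The sketch below is not part of the proof. It assumes, for illustration only, the tuning sequence $\eta_n = n^{-1/4}$ (which satisfies $\eta_n \to 0$ and $\sqrt{n}\,\eta_n \to \infty$), and the function name `prob_A_n` and its default arguments are hypothetical. It uses the fact that, for normal data, $\bar Y_n$ and $\hat\Sigma_n$ are independent with $(n-1)\hat\Sigma_n^2/\sigma^2$ having a chi-square distribution with $n-1$ degrees of freedom.

```python
import math
import random


def prob_A_n(n, eta, sigma=1.0, reps=2000, seed=0):
    """Monte Carlo estimate of P(A_n) under theta_n = sigma * eta / 2,
    where A_n = {|Ybar_n| <= Sigma_hat_n * eta}."""
    rng = random.Random(seed)
    theta_n = sigma * eta / 2.0
    hits = 0
    for _ in range(reps):
        # Ybar_n ~ N(theta_n, sigma^2 / n)
        ybar = rng.gauss(theta_n, sigma / math.sqrt(n))
        # chi-square(n - 1) is Gamma(shape=(n - 1) / 2, scale=2)
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        s = sigma * math.sqrt(chi2 / (n - 1))
        hits += abs(ybar) <= s * eta
    return hits / reps


if __name__ == "__main__":
    for n in (16, 256, 10_000):
        eta = n ** -0.25  # assumed tuning sequence, for illustration
        print(n, prob_A_n(n, eta))
```

As $n$ grows the estimated probability approaches 1, so that $P_{\theta_n,\sigma}(B_n \cap A_n)$ is eventually bounded below by a quantity close to $1 - \alpha$, while the factor $\sqrt{n}\,\eta_n$ in the lower bound diverges.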

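B The motivation for considering the known error
variance case

In this appendix, we motivate the consideration of the "known error variance case". We begin by supposing that the error variance $\sigma^2$ is unknown and is estimated by $\hat\Sigma^2$. We apply the same change of scale (by multiplying by $\sqrt{n}/\sigma$) to the parameter $\theta$, the estimator $\bar Y$ and the estimators $\hat\Theta_H$, $\hat\Theta_S$, $\hat\Theta_A$ and $\hat\Theta_C$ as follows. Define $\psi = (\sqrt{n}/\sigma)\theta$, $X = (\sqrt{n}/\sigma)\bar Y$, $\tau_n = \sqrt{n}\,\eta_n$ and $R = \hat\Sigma/\sigma$. Note that $X$ and $R$ are independent random variables and that $X \sim N(\psi, 1)$. Also define

$$\hat\Psi_H = \frac{\sqrt{n}}{\sigma}\hat\Theta_H = \begin{cases} 0 & \text{if } |X| \le R\tau_n \\ X & \text{if } |X| > R\tau_n \end{cases}$$

$$\hat\Psi_S = \frac{\sqrt{n}}{\sigma}\hat\Theta_S = \begin{cases} -\max\{|X| - R\tau_n, 0\} & \text{if } X < 0 \\ 0 & \text{if } X = 0 \\ \max\{|X| - R\tau_n, 0\} & \text{if } X > 0 \end{cases}$$

$$\hat\Psi_A = \frac{\sqrt{n}}{\sigma}\hat\Theta_A = \begin{cases} 0 & \text{if } |X| \le R\tau_n \\ X - \dfrac{R^2\tau_n^2}{X} & \text{if } |X| > R\tau_n \end{cases}$$

$$\hat\Psi_C = \frac{\sqrt{n}}{\sigma}\hat\Theta_C = \begin{cases} \operatorname{sign}(X)\,(|X| - R\tau_n)_+ & \text{if } |X| \le 2R\tau_n \\ \left((a-1)X - \operatorname{sign}(X)\, a R\tau_n\right)/(a-2) & \text{if } 2R\tau_n < |X| \le aR\tau_n \\ X & \text{if } |X| > aR\tau_n \end{cases}$$

These are not estimators of $\psi$ since they depend on the unknown parameter $\sigma$. Since $R$ and $X$ are independent and $R$ converges in probability to 1 (as $n \to \infty$) it is plausible that, for large $n$, the statistical properties of $\hat\Psi_H$, $\hat\Psi_S$, $\hat\Psi_A$ and $\hat\Psi_C$ are well-approximated by these properties of the corresponding quantities:

$$\tilde\Psi_H = \begin{cases} 0 & \text{if } |X| \le \tau_n \\ X & \text{if } |X| > \tau_n \end{cases}$$

$$\tilde\Psi_S = \begin{cases} -\max\{|X| - \tau_n, 0\} & \text{if } X < 0 \\ 0 & \text{if } X = 0 \\ \max\{|X| - \tau_n, 0\} & \text{if } X > 0 \end{cases}$$

$$\tilde\Psi_A = \begin{cases} 0 & \text{if } |X| \le \tau_n \\ X - \dfrac{\tau_n^2}{X} & \text{if } |X| > \tau_n \end{cases}$$

$$\tilde\Psi_C = \begin{cases} \operatorname{sign}(X)\,(|X| - \tau_n)_+ & \text{if } |X| \le 2\tau_n \\ \left((a-1)X - \operatorname{sign}(X)\, a \tau_n\right)/(a-2) & \text{if } 2\tau_n < |X| \le a\tau_n \\ X & \text{if } |X| > a\tau_n \end{cases}$$

Note that, conveniently, the statistical properties of these quantities depend only on the parameter $\psi$ and not on the parameter $\sigma$.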

Farchione and Kabaila [2] consider the following confidence interval for $\theta$:

$$D = \left[-\frac{\hat\Sigma}{\sqrt{n}}\, b\!\left(-\frac{\sqrt{n}\,\bar Y}{\hat\Sigma}\right),\; \frac{\hat\Sigma}{\sqrt{n}}\, b\!\left(\frac{\sqrt{n}\,\bar Y}{\hat\Sigma}\right)\right]$$

where the function $b$ must satisfy the constraint that $b(x) \ge -b(-x)$ for all $x \in \mathbb{R}$. This constraint is required to ensure that the upper endpoint of this confidence interval is never less than the lower endpoint. This particular form of confidence interval is motivated by some invariance arguments. The standard $1 - \alpha$ confidence interval for $\theta$ is

$$\left[\bar Y - t(n-1)\hat\Sigma/\sqrt{n},\; \bar Y + t(n-1)\hat\Sigma/\sqrt{n}\right]$$

where the quantile $t(m)$ is defined by the requirement that $P(-t(m) \le T \le t(m)) = 1 - \alpha$ for $T \sim t_m$. Note that this confidence interval can be expressed in the form $D$.

Now scale the confidence interval $D$ by the same scaling factor as before, to obtain

$$D^* = \frac{\sqrt{n}}{\sigma}\, D = [-R\, b(-X/R),\; R\, b(X/R)].$$

Note that $\hat\Theta_H \in D$ is equivalent to $\hat\Psi_H \in D^*$. Similar statements apply to the other estimators $\hat\Theta_S$, $\hat\Theta_A$ and $\hat\Theta_C$. Also note that $D^*$ is not a confidence interval for $\psi$, since it depends on the unknown parameter $\sigma$. However,

$$P_{\theta,\sigma}(\theta \in D) = P_\psi(\psi \in D^*),$$

so that

$$\inf_{\theta,\sigma} P_{\theta,\sigma}(\theta \in D) = \inf_\psi P_\psi(\psi \in D^*).$$

Also,

$$\frac{E_{\theta,\sigma}(\text{length of } D)}{E_{\theta,\sigma}(\text{length of standard } 1-\alpha \text{ CI for } \theta)} = \frac{E_\psi(\text{length of } D^*)}{2\,t(n-1)\,E(R)}.$$

Since $R$ and $X$ are independent and $R$ converges in probability to 1 (as $n \to \infty$) it is plausible that, for large $n$, the statistical properties of $D^*$ are well-approximated by the corresponding properties of $C^* = [-b(-X),\; b(X)]$. In fact, the following result holds.

Theorem 2. Suppose that the function $b$ satisfies the following assumptions.

(A1) The function $b$ is continuous and strictly increasing. Also, the function $b^{-1}$ is uniformly continuous.

(A2) Define $e(x) = b(x) - x - z$, where the quantile $z$ is defined by the requirement that $P(-z \le Z \le z) = 1 - \alpha$ for $Z \sim N(0, 1)$.

(i) $e(x) = 0$ for all $|x| \ge q$, where $q$ is a specified positive number.

(ii) There exists $L$, satisfying $0 < L < \infty$, such that $|e(x) - e(y)| \le L|x - y|$ for all $x$ and $y$.

Then

(R1) $\sup_\psi \left|P_\psi(\psi \in C^*) - P_\psi(\psi \in D^*)\right| \to 0$ as $n \to \infty$.

(R2) $\sup_\psi \left|\dfrac{E_\psi(\text{length of } C^*)}{2z} - \dfrac{E_\psi(\text{length of } D^*)}{2\,t(n-1)\,E(R)}\right| \to 0$ as $n \to \infty$.
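Result (R1) can be illustrated by simulation in the simplest case $b(x) = x + z$, for which $e(x) = 0$ for all $x$ and the assumptions of Theorem 2 hold trivially. This is a sketch for illustration, not the function $b$ of [2]; the function name `coverage_D_star` and its default arguments are assumptions made here. In this case $C^*$ has coverage exactly $1 - \alpha$, while $D^* = [X - Rz,\; X + Rz]$ has coverage that approaches $1 - \alpha$ as $n$ increases.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z for 1 - alpha = 0.95


def b(x):
    """Simplest b satisfying (A1) and (A2): the standard interval."""
    return x + Z


def coverage_D_star(n, psi, reps=20000, seed=1):
    """Monte Carlo estimate of P_psi(psi in D*), where
    D* = [-R b(-X / R), R b(X / R)], X ~ N(psi, 1) and
    (n - 1) R^2 ~ chi-square(n - 1), with X and R independent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = rng.gauss(psi, 1.0)
        r = math.sqrt(rng.gammavariate((n - 1) / 2.0, 2.0) / (n - 1))
        lower, upper = -r * b(-x / r), r * b(x / r)
        hits += lower <= psi <= upper
    return hits / reps
```

For small `n` the estimated coverage of $D^*$ falls noticeably below 0.95, whereas for `n` of a few hundred it is close to 0.95, in line with (R1).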

Proof. We prove the result (R1) as follows. Note that

$$P_\psi(\psi \in C^*) = 1 - P_\psi(\psi < -b(-X)) - P_\psi(\psi > b(X))$$

$$P_\psi(\psi \in D^*) = 1 - P_\psi(\psi < -R\,b(-X/R)) - P_\psi(\psi > R\,b(X/R)).$$

It is sufficient to prove that

$$\sup_\psi \left|P_\psi(\psi < -b(-X)) - P_\psi(\psi < -R\,b(-X/R))\right| \to 0 \text{ as } n \to \infty \qquad (3)$$

$$\sup_\psi \left|P_\psi(\psi > b(X)) - P_\psi(\psi > R\,b(X/R))\right| \to 0 \text{ as } n \to \infty \qquad (4)$$

The proofs of (3) and (4) are very similar. For the sake of brevity, we provide only the proof of (4). Suppose that $\epsilon > 0$ is given. We need to prove that there exists $N < \infty$ such that

$$\sup_\psi \left|P_\psi(\psi > b(X)) - P_\psi(\psi > R\,b(X/R))\right| < \epsilon \text{ for all } n > N. \qquad (5)$$

Let $\delta$ ($0 < \delta < 1/2$) be given. Using the law of total probability, it may be shown that

$$\left|P_\psi(\psi > R\,b(X/R)) - P_\psi(\psi > b(X))\right| \le \left|P_\psi(\psi > R\,b(X/R),\, |R-1| \le \delta) - P_\psi(\psi > b(X))\right| + P(|R-1| > \delta). \qquad (6)$$

Obviously,

$$P_\psi(\psi > R\,b(X/R),\, |R-1| \le \delta) = P_\psi\big(\psi > b(X) + (R-1)(z + e(X/R)) + (e(X/R) - e(X)),\, |R-1| \le \delta\big). \qquad (7)$$

It may be shown that if $|R-1| \le \delta$ then there exists $M < \infty$ (where $M$ does not depend on $\delta$) such that $|(R-1)(z + e(X/R)) + (e(X/R) - e(X))| \le M\delta$. Thus

$$P_\psi(\psi > b(X) + M\delta,\, |R-1| \le \delta) \le (7) \le P_\psi(\psi > b(X) - M\delta).$$

Using the law of total probability, it may be shown that

$$P_\psi(\psi > b(X) + M\delta,\, |R-1| \le \delta) \ge P_\psi(\psi > b(X) + M\delta) - P(|R-1| > \delta).$$

Thus

$$P_\psi(\psi > b(X) + M\delta) - P(|R-1| > \delta) \le (7) \le P_\psi(\psi > b(X) - M\delta).$$

In other words,

$$P_\psi\big(X < b^{-1}(\psi - M\delta)\big) - P(|R-1| > \delta) \le (7) \le P_\psi\big(X < b^{-1}(\psi + M\delta)\big).$$

Note that $P_\psi(\psi > b(X)) = P_\psi(X < b^{-1}(\psi))$. Using the uniform continuity of $b^{-1}$ and the fact that $X \sim N(\psi, 1)$, it may be shown that there exists $\delta$ ($0 < \delta < 1/2$) such that

$$\sup_\psi \left|P_\psi\big(X < b^{-1}(\psi - M\delta)\big) - P_\psi\big(X < b^{-1}(\psi)\big)\right| < \epsilon/2$$

$$\sup_\psi \left|P_\psi\big(X < b^{-1}(\psi + M\delta)\big) - P_\psi\big(X < b^{-1}(\psi)\big)\right| < \epsilon/2.$$

Choose $\delta$ ($0 < \delta < 1/2$) such that these two inequalities are satisfied. Therefore, $|(7) - P_\psi(\psi > b(X))| \le P(|R-1| > \delta) + \epsilon/2$. It follows from (6) that $|P_\psi(\psi > R\,b(X/R)) - P_\psi(\psi > b(X))| \le 2P(|R-1| > \delta) + \epsilon/2$. Since $P(|R-1| > \delta) \to 0$ as $n \to \infty$, there exists $N < \infty$ such that (5) is satisfied. This completes the proof of the result (R1).

We prove the result (R2) as follows. It may be shown that it is sufficient to prove that

$$\sup_\psi \left|E_\psi(\text{length of } C^*) - E_\psi(\text{length of } D^*)\right| \to 0 \text{ as } n \to \infty.$$

Now

$$E_\psi(\text{length of } D^*) - E_\psi(\text{length of } C^*) = 2z(E(R) - 1) + \big(E_\psi(\text{length of } D^*) - 2zE(R)\big) - \big(E_\psi(\text{length of } C^*) - 2z\big).$$

Hence

$$\left|E_\psi(\text{length of } D^*) - E_\psi(\text{length of } C^*)\right| \le 2z|E(R) - 1| + \left|\big(E_\psi(\text{length of } D^*) - 2zE(R)\big) - \big(E_\psi(\text{length of } C^*) - 2z\big)\right|.$$

Since $E(R)$ does not depend on $\psi$ and $E(R) \to 1$ as $n \to \infty$, it is sufficient to prove that

$$\sup_\psi \left|\big(E_\psi(\text{length of } D^*) - 2zE(R)\big) - \big(E_\psi(\text{length of } C^*) - 2z\big)\right| \to 0 \text{ as } n \to \infty.$$

Let $f_R$ denote the probability density function of $R$ and let $\phi$ denote the $N(0, 1)$ probability density function. Now

$$E_\psi(\text{length of } D^*) - 2zE(R) = \int_0^\infty \int_{-\infty}^{\infty} \left(b\!\left(\frac{x}{r}\right) + b\!\left(-\frac{x}{r}\right) - 2z\right) \phi(x - \psi)\, dx\; r f_R(r)\, dr$$

$$= \int_0^\infty \int_{-rq}^{rq} \left(b\!\left(\frac{x}{r}\right) + b\!\left(-\frac{x}{r}\right) - 2z\right) \phi(x - \psi)\, dx\; r f_R(r)\, dr \qquad (8)$$

since $b(x/r) + b(-x/r) - 2z = 0$ for all $|x| \ge rq$. Changing the variable of integration from $x$ to $y = x/r$ (and then renaming $y$ as $x$), we see that (8) is equal to

$$\int_0^\infty \int_{-q}^{q} \big(b(x) + b(-x) - 2z\big)\, \phi(rx - \psi)\, dx\; r^2 f_R(r)\, dr.$$

Now

$$E_\psi(\text{length of } C^*) - 2z = \int_{-\infty}^{\infty} \big(b(x) + b(-x) - 2z\big)\, \phi(x - \psi)\, dx$$

$$= \int_{-q}^{q} \big(b(x) + b(-x) - 2z\big)\, \phi(x - \psi)\, dx \quad \text{(by (A2)(i))}$$

$$= \int_0^\infty \int_{-q}^{q} \big(b(x) + b(-x) - 2z\big)\, \phi(x - \psi)\, dx\; r^2 f_R(r)\, dr, \qquad (9)$$

since

$$\int_0^\infty r^2 f_R(r)\, dr = E(R^2) = 1.$$

Thus

$$\big(E_\psi(\text{length of } D^*) - 2zE(R)\big) - \big(E_\psi(\text{length of } C^*) - 2z\big) = \int_0^\infty \int_{-q}^{q} \big(e(x) + e(-x)\big)\big(\phi(rx - \psi) - \phi(x - \psi)\big)\, dx\; r^2 f_R(r)\, dr. \qquad (10)$$

By the mean-value theorem, there exists a positive number $K < \infty$ such that $|\phi(rx - \psi) - \phi(x - \psi)| \le K|r - 1||x|$ for all $r > 0$ and $x \in \mathbb{R}$. Thus

$$|(10)| \le 4LKq^3\, E(|R - 1| R^2).$$

Note that $E(|R - 1| R^2)$ does not depend on $\psi$ and that, by the Cauchy-Schwarz inequality, $E(|R - 1| R^2) \to 0$ as $n \to \infty$. This completes the proof of (R2).

□
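The constant in the final bound of the proof of (R2) can be traced step by step. The following worked derivation is a sketch added for the reader's convenience, using only assumptions (A2)(i) and (A2)(ii), the mean-value bound, and the fact that $(n-1)R^2$ has a chi-square distribution with $n-1$ degrees of freedom; the explicit choice of $K$ is one admissible value and is not taken from the paper.

```latex
% Step 1: since e(q) = e(-q) = 0 by (A2)(i), the Lipschitz condition (A2)(ii) gives,
% for |x| <= q,
|e(x)| = |e(x) - e(q)| \le L\,|x - q| \le 2Lq,
\qquad\text{so}\qquad |e(x) + e(-x)| \le 4Lq .

% Step 2: the mean-value bound holds with K = \sup_u |\phi'(u)| = (2\pi e)^{-1/2}, so
\int_{-q}^{q} |e(x) + e(-x)|\,\bigl|\phi(rx - \psi) - \phi(x - \psi)\bigr|\,dx
  \le 4Lq \cdot K\,|r - 1| \int_{-q}^{q} |x|\,dx
  = 4LKq^{3}\,|r - 1| .

% Step 3: integrating against r^2 f_R(r) gives the stated bound
|(10)| \le 4LKq^{3}\,E\bigl(|R - 1|\,R^{2}\bigr).

% Step 4: by the Cauchy-Schwarz inequality, using E(R^2) = 1 and
% E(R^4) = 1 + 2/(n-1),
E\bigl(|R - 1|\,R^{2}\bigr)
  \le \sqrt{E\bigl((R - 1)^{2}\bigr)}\,\sqrt{E(R^{4})}
  = \sqrt{2 - 2E(R)}\,\sqrt{1 + \tfrac{2}{n-1}} \;\longrightarrow\; 0 ,
% because E(R) -> 1 as n -> infinity.
```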

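Thus, to study the coverage and expected length properties of the confidence interval $D$ for $\theta$ when $n$ is large, we study the properties of $P_\psi(\psi \in C^*)$ and $E_\psi(\text{length of } C^*)$, which are simply functions of $\psi$.

Now suppose that the "error variance is known", i.e. $\sigma^2$ is known. The analogues of the estimators $\hat\Theta_H$, $\hat\Theta_S$, $\hat\Theta_A$ and $\hat\Theta_C$ are $\tilde\Theta_H$, $\tilde\Theta_S$, $\tilde\Theta_A$ and $\tilde\Theta_C$ respectively, where

$$\tilde\Theta_H = \begin{cases} 0 & \text{if } |\bar Y| \le \sigma\eta_n \\ \bar Y & \text{if } |\bar Y| > \sigma\eta_n \end{cases}$$

$$\tilde\Theta_S = \begin{cases} -\max\{|\bar Y| - \sigma\eta_n, 0\} & \text{if } \bar Y < 0 \\ 0 & \text{if } \bar Y = 0 \\ \max\{|\bar Y| - \sigma\eta_n, 0\} & \text{if } \bar Y > 0 \end{cases}$$

$$\tilde\Theta_A = \begin{cases} 0 & \text{if } |\bar Y| \le \sigma\eta_n \\ \bar Y - \dfrac{\sigma^2\eta_n^2}{\bar Y} & \text{if } |\bar Y| > \sigma\eta_n \end{cases}$$

$$\tilde\Theta_C = \begin{cases} \operatorname{sign}(\bar Y)\,(|\bar Y| - \sigma\eta_n)_+ & \text{if } |\bar Y| \le 2\sigma\eta_n \\ \left((a-1)\bar Y - \operatorname{sign}(\bar Y)\, a\sigma\eta_n\right)/(a-2) & \text{if } 2\sigma\eta_n < |\bar Y| \le a\sigma\eta_n \\ \bar Y & \text{if } |\bar Y| > a\sigma\eta_n \end{cases}$$

where $a = 3.7$. Also, the analogue of the confidence interval $D$ for $\theta$ is

$$C = \left[-\frac{\sigma}{\sqrt{n}}\, b\!\left(-\frac{\sqrt{n}\,\bar Y}{\sigma}\right),\; \frac{\sigma}{\sqrt{n}}\, b\!\left(\frac{\sqrt{n}\,\bar Y}{\sigma}\right)\right].$$

Scaling $\theta$, $\tilde\Theta_H$, $\tilde\Theta_S$, $\tilde\Theta_A$, $\tilde\Theta_C$ and $C$ by multiplying by $\sqrt{n}/\sigma$, we obtain $\psi$, $\tilde\Psi_H$, $\tilde\Psi_S$, $\tilde\Psi_A$, $\tilde\Psi_C$ and $C^*$, respectively. In other words, when we suppose that the "error variance is known", we are finding an approximation (by the arguments stated earlier in this section) to the coverage probability and expected length properties of $\hat\Theta_H$, $\hat\Theta_S$, $\hat\Theta_A$, $\hat\Theta_C$ and $D$ for large $n$.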